Results 1 - 20 of 277
1.
PLOS Digit Health ; 3(4): e0000484, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38620037

ABSTRACT

Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries. For adults, we used a federated learning approach whereby we ran Cox proportional hazard models locally at each healthcare system and performed a meta-analysis on the aggregated results to estimate the overall risk of adverse outcomes across our geographically diverse populations. For children, we reported descriptive statistics separately due to their low frequency of neurological involvement and poor outcomes. Among the 106,229 hospitalized COVID-19 patients (104,031 patients ≥18 years; 2,198 patients <18 years, January 2020-October 2021), 15,101 (14%) had at least one CNS diagnosis, while 2,788 (3%) had at least one PNS diagnosis. After controlling for demographics and pre-existing conditions, adults with CNS involvement had a longer hospital stay (11 versus 6 days), a greater risk of death (hazard ratio = 1.78), and a shorter time to death (12 versus 24 days) than patients with no neurological condition (NNC) during acute COVID-19 hospitalization. Adults with PNS involvement also had a longer hospital stay but a lower risk of mortality than the NNC group. Although children had a low frequency of neurological involvement during COVID-19 hospitalization, a substantially higher proportion of children with CNS involvement died compared to those with NNC (6% vs 1%). Overall, patients with concurrent CNS manifestation during acute COVID-19 hospitalization faced greater risks for adverse clinical outcomes than patients without any neurological diagnosis.
Our global informatics framework using a federated approach (versus a centralized data collection approach) has utility for clinical discovery beyond COVID-19.
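
The federated step described in this abstract (fit a Cox model at each site, then meta-analyze the site-level results) can be illustrated with a minimal sketch. The abstract does not say which pooling method was used, so the inverse-variance fixed-effect pooling below is an assumption, and the site-level estimates are hypothetical values, not results from the study.

```python
import math

def pool_fixed_effect(log_hrs, std_errs):
    """Inverse-variance (fixed-effect) pooling of site-level log hazard ratios.

    Assumption: each site shares only its log hazard ratio and standard error,
    so no patient-level data leave the site.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical site-level estimates (not taken from the study).
site_log_hrs = [math.log(1.6), math.log(1.9), math.log(1.8)]
site_std_errs = [0.10, 0.20, 0.15]

pooled_log_hr, pooled_se = pool_fixed_effect(site_log_hrs, site_std_errs)
pooled_hr = math.exp(pooled_log_hr)
ci_low = math.exp(pooled_log_hr - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"pooled HR = {pooled_hr:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

A random-effects model would be a natural alternative when between-site heterogeneity is expected; the fixed-effect form is used here only because it is the shortest to show.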

2.
Sci Rep ; 14(1): 8021, 2024 04 05.
Article in English | MEDLINE | ID: mdl-38580710

ABSTRACT

The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power in clinical trials to study differential treatment effects across patient subgroups. However, limited methods exist to efficiently test for differences across subgroups in the thousands of multiple comparisons generated as part of a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied this approach to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes in AFR (OR 0.96) vs EUR (OR 1.0, p-value for heterogeneity = 8.5 × 10^-3), and higher white blood cell count (p-value for heterogeneity = 8.5 × 10^-131). These data suggest a more salutary effect of IL6R blockade for T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.
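
To make the idea of a heterogeneity test concrete, the sketch below compares two ancestry-specific log odds ratios with a standard two-sample z-test. This is a textbook baseline, not the power-maximizing method the abstract describes. The odds ratios are the ones quoted in the abstract, but the standard errors are hypothetical, so the resulting p-value is illustrative only.

```python
import math

def heterogeneity_z_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two independent effect estimates."""
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Log odds ratios from the ORs quoted in the abstract; standard errors are hypothetical.
beta_afr, se_afr = math.log(0.96), 0.012
beta_eur, se_eur = math.log(1.00), 0.008

z_stat, p_het = heterogeneity_z_test(beta_afr, se_afr, beta_eur, se_eur)
print(f"z = {z_stat:.2f}, p for heterogeneity = {p_het:.2g}")
```

In a PheWAS this test would be repeated across thousands of traits, so any threshold applied to the p-values would need a multiple-comparison correction.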


Subjects
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Phenotype , Polymorphism, Single Nucleotide , Receptors, Interleukin-6/genetics
3.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38465982

ABSTRACT

In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on unlabeled target population based on ROC analysis. We proposed Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under the correct specification of either the density ratio model or the outcome model. We also correct for potential overfitting bias in the estimators in finite samples with cross-validation. We compare our proposed estimators to existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method on evaluating prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving EHR cohort.
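
The core reweighting idea behind transferring an accuracy measure can be sketched as an importance-weighted AUC: labeled source observations are weighted by an estimated target-to-source density ratio before the ROC summary is computed. This sketch shows only that idea. It does not implement STEAM's double-index modeling, imputation, or cross-validation steps, and the scores, labels, and weights are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """AUC over positive/negative pairs, each pair weighted by the product of its weights.

    `weights` stands in for density ratio weights (target density over source
    density); how they are estimated is outside the scope of this sketch.
    """
    positives = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    negatives = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    numerator = 0.0
    denominator = 0.0
    for s_pos, w_pos in positives:
        for s_neg, w_neg in negatives:
            pair_weight = w_pos * w_neg
            denominator += pair_weight
            if s_pos > s_neg:
                numerator += pair_weight
            elif s_pos == s_neg:
                numerator += 0.5 * pair_weight  # ties count as half
    if denominator == 0.0:
        raise ValueError("need at least one positive and one negative with nonzero weight")
    return numerator / denominator

# Hypothetical classifier scores and labels from a labeled source sample.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

auc_source = weighted_auc(scores, labels, [1.0] * 5)       # unweighted source AUC
auc_target = weighted_auc(scores, labels, [1, 1, 3, 1, 1])  # reweighted toward the target
print(auc_source, auc_target)
```

Upweighting the low-scoring positive case lowers the estimate, which is the kind of shift a covariate distribution change can produce.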


Subjects
Machine Learning , Supervised Machine Learning , Humans , ROC Curve , Research Design , Bias
4.
J Am Med Inform Assoc ; 31(5): 1126-1134, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38481028

ABSTRACT

OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.


Subjects
Electronic Health Records , Phenomics , Phenotype , Knowledge Bases , Algorithms
5.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38386359

ABSTRACT

In clinical studies of chronic diseases, the effectiveness of an intervention is often assessed using "high cost" outcomes that require long-term patient follow-up and/or are invasive to obtain. While much progress has been made in the development of statistical methods to identify surrogate markers, that is, measurements that could replace such costly outcomes, they are generally not applicable to studies with a small sample size. These methods either rely on nonparametric smoothing which requires a relatively large sample size or rely on strict model assumptions that are unlikely to hold in practice and empirically difficult to verify with a small sample size. In this paper, we develop a novel rank-based nonparametric approach to evaluate a surrogate marker in a small sample size setting. The method developed in this paper is motivated by a small study of children with nonalcoholic fatty liver disease (NAFLD), a diagnosis for a range of liver conditions in individuals without significant history of alcohol intake. Specifically, we examine whether change in alanine aminotransferase (ALT; measured in blood) is a surrogate marker for change in NAFLD activity score (obtained by biopsy) in a trial, which compared Vitamin E (n=50) versus placebo (n=46) among children with NAFLD.


Subjects
Non-alcoholic Fatty Liver Disease , Child , Humans , Non-alcoholic Fatty Liver Disease/diagnosis , Biomarkers , Biopsy , Sample Size
6.
Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

7.
Stud Health Technol Inform ; 310: 649-653, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269889

ABSTRACT

Several studies have shown that about 80% of the medical information in an electronic health record is only available through unstructured data. Resources such as medical terminologies in languages other than English are limited, which constrains NLP tools. We propose here to leverage English-based resources in other languages using a combination of translation, word alignment, entity extraction and term normalization (TAXN). We implement this extraction pipeline in an open-source library called "medkit". We demonstrate the value of this approach through a specific use case: enriching a phenotypic dictionary for post-acute sequelae in COVID-19 (PASC). TAXN proved effective at proposing new synonyms of UMLS terms using a corpus of 70 articles in French, with 356 terms enriched with at least one validated new synonym. This study was based on freely available deep-learning models.


Subjects
Multilingualism , Humans , Language , Disease Progression , Electronic Health Records
8.
Med Care ; 62(2): 102-108, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38079232

ABSTRACT

BACKGROUND: There is tremendous interest in evaluating surrogate markers given their potential to decrease study time, costs, and patient burden. OBJECTIVES: The purpose of this statistical workshop article is to describe and illustrate how to evaluate a surrogate marker of interest using the proportion of treatment effect (PTE) explained as a measure of the quality of the surrogate marker for: (1) a setting with a general fully observed primary outcome (eg, biopsy score); and (2) a setting with a time-to-event primary outcome which may be censored due to study termination or early drop out (eg, time to diabetes). METHODS: The methods are motivated by 2 randomized trials, one among children with nonalcoholic fatty liver disease where the primary outcome was a change in biopsy score (general outcome) and another study among adults at high risk for Type 2 diabetes where the primary outcome was time to diabetes (time-to-event outcome). The methods are illustrated using the Rsurrogate package with a detailed R code provided. RESULTS: In the biopsy score outcome setting, the estimated PTE of the examined surrogate marker was 0.182 (95% confidence interval [CI]: 0.121, 0.240), that is, the surrogate explained only 18.2% of the treatment effect on the biopsy score. In the diabetes setting, the estimated PTE of the surrogate marker was 0.596 (95% CI: 0.404, 0.760), that is, the surrogate explained 59.6% of the treatment effect on diabetes incidence. CONCLUSIONS: This statistical workshop provides tools that will support future researchers in the evaluation of surrogate markers.
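
The PTE quoted in this abstract is commonly defined as one minus the ratio of the residual treatment effect (what remains after accounting for the surrogate) to the total treatment effect. The sketch below shows only that arithmetic, under the assumption that this definition applies here. The function name is hypothetical and is not part of the Rsurrogate package, and the effect values are illustrative rather than taken from either trial.

```python
def proportion_explained(total_effect, residual_effect):
    """PTE = 1 - residual_effect / total_effect (assumed definition).

    total_effect: treatment effect on the primary outcome.
    residual_effect: treatment effect remaining after accounting for the surrogate.
    """
    if total_effect == 0:
        raise ValueError("total treatment effect must be nonzero")
    return 1.0 - residual_effect / total_effect

# Illustrative values only: a surrogate that leaves most of the effect unexplained.
pte_weak = proportion_explained(total_effect=1.0, residual_effect=0.818)
# Illustrative values only: a surrogate that captures more than half of the effect.
pte_moderate = proportion_explained(total_effect=1.0, residual_effect=0.404)
print(round(pte_weak, 3), round(pte_moderate, 3))
```

In practice both effects are estimated from trial data, and the confidence interval for the PTE reflects the uncertainty in both.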


Subjects
Diabetes Mellitus, Type 2 , Child , Humans , Treatment Outcome , Biomarkers
9.
medRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873131

ABSTRACT

Though electronic health record (EHR) systems are a rich repository of clinical information with large potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features by an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, features selected by ONCE show high concordance with the state-of-the-art AI models (GPT4 and ChatGPT) and encourage large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared to other methods requiring patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.

10.
Article in English | MEDLINE | ID: mdl-37907279

ABSTRACT

INTRODUCTION: We measured and compared five individual surrogate markers, namely change from baseline to 1 year after randomization in hemoglobin A1c (HbA1c), fasting glucose, 2-hour postchallenge glucose, triglyceride-glucose (TyG) index, and homeostatic model assessment of insulin resistance (HOMA-IR), in terms of their ability to explain a treatment effect on reducing the risk of type 2 diabetes mellitus at 2, 3, and 4 years after treatment initiation. RESEARCH DESIGN AND METHODS: Study participants were from the Diabetes Prevention Program study, randomly assigned to either a lifestyle intervention (n=1023) or placebo (n=1030). The surrogate markers were measured at baseline and 1 year, and diabetes incidence was examined at 2, 3, and 4 years postrandomization. Surrogacy was evaluated using a robust model-free estimate of the proportion of treatment effect explained (PTE) by the surrogate marker. RESULTS: Across all time points, change in fasting glucose and HOMA-IR explained higher proportions of the treatment effect than 2-hour glucose, TyG index, or HbA1c. For example, at 2 years, fasting glucose explained the highest (80.1%) proportion of the treatment effect, followed by HOMA-IR (77.7%), 2-hour glucose (76.2%), and HbA1c (74.6%); the TyG index explained the smallest (70.3%) proportion. CONCLUSIONS: These data suggest that, of the five examined surrogate markers, fasting glucose and HOMA-IR were the superior surrogate markers in terms of PTE, compared with 2-hour glucose, HbA1c, and TyG index.


Subjects
Diabetes Mellitus, Type 2 , Insulin Resistance , Humans , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/prevention & control , Blood Glucose , Glycated Hemoglobin , Incidence , Biomarkers , Glucose
11.
EClinicalMedicine ; 64: 102212, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37745025

ABSTRACT

Background: Multisystem inflammatory syndrome in children (MIS-C) is a severe complication of SARS-CoV-2 infection. It remains unclear how MIS-C phenotypes vary across SARS-CoV-2 variants. We aimed to investigate clinical characteristics and outcomes of MIS-C across SARS-CoV-2 eras. Methods: We performed a multicentre observational retrospective study including seven paediatric hospitals in four countries (France, Spain, U.K., and U.S.). All consecutive confirmed patients with MIS-C hospitalised between February 1st, 2020, and May 31st, 2022, were included. Electronic Health Records (EHR) data were used to calculate pooled risk differences (RD) and effect sizes (ES) at site level, using Alpha as reference. Meta-analysis was used to pool data across sites. Findings: Of 598 patients with MIS-C (61% male, 39% female; mean age 9.7 years [SD 4.5]), 383 (64%) were admitted in the Alpha era, 111 (19%) in the Delta era, and 104 (17%) in the Omicron era. Compared with patients admitted in the Alpha era, those admitted in the Delta era were younger (ES -1.18 years [95% CI -2.05, -0.32]), had fewer respiratory symptoms (RD -0.15 [95% CI -0.33, -0.04]), less frequent non-cardiogenic shock or systemic inflammatory response syndrome (SIRS) (RD -0.35 [95% CI -0.64, -0.07]), lower lymphocyte count (ES -0.16 × 10^9/uL [95% CI -0.30, -0.01]), lower C-reactive protein (ES -28.5 mg/L [95% CI -46.3, -10.7]), and lower troponin (ES -0.14 ng/mL [95% CI -0.26, -0.03]). Patients admitted in the Omicron versus Alpha eras were younger (ES -1.6 years [95% CI -2.5, -0.8]), had less frequent SIRS (RD -0.18 [95% CI -0.30, -0.05]), lower lymphocyte count (ES -0.39 × 10^9/uL [95% CI -0.52, -0.25]), lower troponin (ES -0.16 ng/mL [95% CI -0.30, -0.01]) and less frequently received anticoagulation therapy (RD -0.19 [95% CI -0.37, -0.04]). Length of hospitalization was shorter in the Delta versus Alpha eras (-1.3 days [95% CI -2.3, -0.4]).
Interpretation: Our study suggested that MIS-C clinical phenotypes varied across SARS-CoV-2 eras, with patients in Delta and Omicron eras being younger and less sick. EHR data can be effectively leveraged to identify rare complications of pandemic diseases and their variation over time. Funding: None.

12.
JAMA Intern Med ; 183(10): 1090-1097, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37603326

ABSTRACT

Importance: The US Food and Drug Administration (FDA) is building a national postmarketing surveillance system for medical devices, moving to a "total product life cycle" approach whereby more limited premarketing data are balanced with postmarketing surveillance to capture rare adverse events and long-term safety issues. Objective: To assess the methodological requirements and feasibility of postmarketing device surveillance using endovascular aneurysm repair devices (EVARs), which have been the subject of safety concerns, using clinical data from a large health care system. Design, Setting, and Participants: This retrospective cohort study included patients with electronic health record (EHR) data in the Veterans Affairs Corporate Data Warehouse. Exposure: Implantation of an AFX Endovascular AAA System (AFX) device (any of 3 iterations) or a non-AFX comparator EVAR device from January 1, 2011, to December 21, 2021. Main Outcomes and Measures: The primary outcomes were rates of type III endoleaks and all-cause mortality; and rates of these outcomes associated with AFX devices compared with non-AFX devices, assessed using Cox proportional hazards regression models and doubly robust causal modeling. Information on type III endoleaks was available only as free-text mentions in clinical notes, while all-cause mortality data could be extracted using structured data. Device-specific information required by the FDA is ascertained using unique device identifiers (UDIs), which include factors such as model numbers, catalog numbers, and manufacturer-specific product codes. The availability of UDIs in EHRs was assessed. Results: In total, 13 941 patients (mean [SD] age, 71.8 [7.4] years) received 1 of the devices of interest (AFX with Strata [AFX-S]: 718 patients [5.2%]; AFX with Duraply [AFX-D]: 404 patients [2.9%]; or AFX2: 682 patients [4.9%]), and 12 137 (87.1%) received non-AFX devices. 
The UDIs were not recorded in the EHR for any patient with an AFX device, and partial UDIs were available for 19 patients (0.1%) with a non-AFX device. This necessitated the development of advanced natural language processing tools to define the cohort of patients for analysis. The study identified a significantly higher risk of type III endoleaks at 5 years among patients receiving any of the AFX device iterations, including the most recent version, AFX2 (11.6%; 95% CI, 8.1%-15.1%) compared with that among patients with non-AFX devices (5.7%; 95% CI, 2.2%-9.2%; absolute risk difference, 5.9%; 95% CI, 2.3%-9.4%). However, there was no significantly higher all-cause mortality for any of the AFX device iterations, including for AFX2 (19.0%; 95% CI, 16.0%-22.0%) compared with non-AFX devices (18.0%; 95% CI, 15.0%-21.0%; absolute risk difference, 1.0%; 95% CI, -2.1% to 4.1%). Conclusions and Relevance: The findings of this cohort study suggest that clinical data can be used for the postmarketing device surveillance required by the FDA. The study also highlights ongoing challenges to performing larger-scale surveillance, including lack of consistent use of UDIs and insufficient relevant structured data to efficiently capture certain outcomes of interest.


Subjects
Aortic Aneurysm, Abdominal , Blood Vessel Prosthesis Implantation , Endovascular Procedures , Humans , Aged , Blood Vessel Prosthesis , Endoleak/etiology , Endovascular Aneurysm Repair , Aortic Aneurysm, Abdominal/etiology , Aortic Aneurysm, Abdominal/mortality , Aortic Aneurysm, Abdominal/surgery , Retrospective Studies , Cohort Studies , Treatment Outcome , Endovascular Procedures/adverse effects , Endovascular Procedures/instrumentation
13.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients. Results: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web-API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivities for detecting similar and related entity pairs are 0.906 and 0.888, respectively, under false discovery rate (FDR) control of 5%.
For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723 while the AUC improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side effects pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubmedBERT, BioBERT and SAPBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast progression subgroup had a much higher mortality rate. Conclusions: The proposed ARCH algorithm generates large-scale high-quality semantic representations and knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
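
The relatedness score this abstract relies on, cosine similarity between concept embeddings, can be shown in a few lines. This sketch covers only the similarity computation. It does not derive embeddings from a co-occurrence matrix or compute the p-values the abstract mentions, and the concept names and three-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional embeddings; real concept embeddings are much larger.
embeddings = {
    "drug_x": [0.8, 0.1, 0.3],
    "side_effect_y": [0.7, 0.2, 0.4],
    "unrelated_z": [-0.1, 0.9, -0.2],
}

related = cosine_similarity(embeddings["drug_x"], embeddings["side_effect_y"])
unrelated = cosine_similarity(embeddings["drug_x"], embeddings["unrelated_z"])
print(f"related pair: {related:.3f}, unrelated pair: {unrelated:.3f}")
```

Ranking candidate pairs by this score and thresholding it is one simple way such embeddings could be used to flag drug and side-effect pairs.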

14.
J Biomed Inform ; 143: 104415, 2023 07.
Article in English | MEDLINE | ID: mdl-37276949

ABSTRACT

Disease knowledge graphs have emerged as a powerful tool for artificial intelligence to connect, organize, and access diverse information about diseases. Relations between disease concepts are often distributed across multiple datasets, including unstructured plain text datasets and incomplete disease knowledge graphs. Extracting disease relations from multimodal data sources is thus crucial for constructing accurate and comprehensive disease knowledge graphs. We introduce REMAP, a multimodal approach for disease relation extraction. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, aligning the multimodal embeddings for optimal disease relation extraction. Additionally, REMAP utilizes a decoupled model structure to enable inference in single-modal data, which can be applied under missing modality scenarios. We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves language-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with language information. Furthermore, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). REMAP is a flexible multimodal approach for extracting disease relations by fusing structured knowledge and language information. This approach provides a powerful model to easily find, access, and evaluate relations between disease concepts.


Subjects
Artificial Intelligence , Machine Learning , Humans , Unified Medical Language System , Language , Natural Language Processing
15.
Arthritis Res Ther ; 25(1): 93, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37269020

ABSTRACT

BACKGROUND: Many patients with rheumatoid arthritis (RA) require a trial of multiple biologic disease-modifying anti-rheumatic drugs (bDMARDs) to control their disease. With the availability of several bDMARD options, the history of bDMARDs may provide an alternative approach to understanding subphenotypes of RA. The objective of this study was to determine whether there exist distinct clusters of RA patients based on bDMARD prescription history to subphenotype RA. METHODS: We studied patients from a validated electronic health record-based RA cohort with data from January 1, 2008, through July 31, 2019; all subjects prescribed ≥ 1 bDMARD or targeted synthetic (ts) DMARD were included. To determine whether subjects had similar b/tsDMARD sequences, the sequences were considered as a Markov chain over the state-space of 5 classes of b/tsDMARDs. The maximum likelihood estimator (MLE)-based approach was used to estimate the Markov chain parameters to determine the clusters. The EHR data of study subjects were further linked with a registry containing prospectively collected data for RA disease activity, i.e., clinical disease activity index (CDAI). As a proof of concept, we tested whether the clusters derived from b/tsDMARD sequences correlated with clinical measures, specifically differing trajectories of CDAI. RESULTS: We studied 2172 RA subjects, mean age 52 years, RA duration 3.4 years, and 62% seropositive. We observed 550 unique b/tsDMARD sequences and identified 4 main clusters: (1) TNFi persisters (65.7%), (2) TNFi and abatacept therapy (8.0%), (3) on rituximab or multiple b/tsDMARDs (12.7%), (4) prescribed multiple therapies with tocilizumab predominant (13.6%). Compared to the other groups, TNFi persisters had the most favorable trajectory of CDAI over time. 
CONCLUSION: We observed that RA subjects can be clustered based on the sequence of b/tsDMARD prescriptions over time and that the clusters were correlated with differing trajectories of disease activity over time. This study highlights an alternative approach to consider subphenotyping of patients with RA for studies aimed at understanding treatment response.
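
Treating prescription sequences as a Markov chain, as this abstract describes, means the maximum likelihood estimate of each transition probability is the observed count of that transition divided by all transitions out of the same state. The sketch below shows only that estimation step for a single chain. The clustering of patients is not shown, and the class labels and sequences are hypothetical, since the abstract does not list the 5 classes.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """MLE of first-order Markov transition probabilities from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }

# Hypothetical prescription sequences over hypothetical drug-class labels.
sequences = [
    ["TNFi", "TNFi", "abatacept"],
    ["TNFi", "TNFi", "TNFi"],
    ["TNFi", "rituximab"],
]

transition_probs = estimate_transitions(sequences)
print(transition_probs["TNFi"])
```

A clustering approach could then fit one transition matrix per cluster and assign each patient to the cluster under which their sequence is most likely.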


Subjects
Antirheumatic Agents , Arthritis, Rheumatoid , Biological Products , Humans , Middle Aged , Arthritis, Rheumatoid/drug therapy , Antirheumatic Agents/therapeutic use , Rituximab/therapeutic use , Abatacept/therapeutic use , Biological Products/therapeutic use
16.
J Biomed Inform ; 144: 104425, 2023 08.
Article in English | MEDLINE | ID: mdl-37331495

ABSTRACT

OBJECTIVE: Electronic health records (EHRs), containing detailed longitudinal clinical information on a large number of patients and covering broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, because EHRs were originally constructed for administrative purposes rather than for research, EHR-linked studies often cannot capture reliable information for analytical variables, especially in the survival setting, where both accurate event status and event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often involves complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time, such as the time to the first mention of progression in the notes, are at best good approximations to the true event time. This makes it difficult to estimate event rates efficiently for an EHR patient cohort. Estimating survival rates based on error-prone outcome definitions can lead to biased results and reduce power in downstream analyses. On the other hand, extracting accurate event time information via manual annotation is time- and resource-intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data. MATERIALS AND METHODS: In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that effectively overcomes censoring-induced dependency and attains more robust performance (i.e., it is not sensitive to misspecification of the imputation model) by fully utilizing both a small labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured from the EHR in the unlabeled set.
We validate the SCANER estimator by estimating the PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and the ICU-free survival rates for COVID-19 patients from two large tertiary care centers. RESULTS: In terms of survival rate estimates, the SCANER estimator had point estimates very similar to those of the complete-case Kaplan-Meier (KM) estimator. In contrast, the other benchmark methods, which fail to account for the induced dependency between the event time and the censoring time conditional on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to a 50% efficiency gain. CONCLUSION: The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates compared to existing approaches. This promising new approach can also improve the resolution (i.e., granularity of event time) by using labels conditioning on multiple surrogates, particularly among less common or poorly coded conditions.
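
As a point of reference for the benchmark named in the results, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator. It is not the SCANER estimator, whose two-stage calibration is not specified in this abstract. The function name `kaplan_meier` and the toy follow-up data are illustrative assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : observed follow-up time per patient (event or censoring time)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # every patient leaves the risk set at their own time

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients with an event or censoring at t are removed afterwards.
        at_risk -= exit_counts[t]
    return curve


# Toy data (made up for illustration): five patients, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}  S(t)={s:.3f}")
```

A complete-case analysis of this kind uses only the manually labeled patients, which is why an estimator that also draws on unlabeled proxy features can have smaller standard errors.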


Subjects
COVID-19 , Lung Neoplasms , Humans , Electronic Health Records , Calibration , Survival Analysis
17.
J Med Internet Res ; 25: e45662, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37227772

ABSTRACT

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we develop a reporting guideline for RWE that covers the information needed for transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We showcase the pipeline and provide guidance on deploying the relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.


Subjects
Colonic Neoplasms , Electronic Health Records , Humans , Algorithms , Informatics , Research Design
18.
EBioMedicine ; 92: 104581, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37121095

ABSTRACT

BACKGROUND: Rheumatoid arthritis (RA) shares genetic variants with other autoimmune conditions, but existing studies test the association between RA variants and a pre-defined set of phenotypes. The objective of this study was to perform a large-scale, systemic screen to determine phenotypes that share genetic architecture with RA to inform our understanding of shared pathways. METHODS: In the UK Biobank (UKB), we constructed RA genetic risk scores (GRS) incorporating human leukocyte antigen (HLA) and non-HLA risk alleles. Phenotypes were defined using groupings of International Classification of Diseases (ICD) codes. Patients with an RA code were excluded to mitigate the possibility of associations being driven by the diagnosis or management of RA. We performed a phenome-wide association study, testing the association between the RA GRS and phenotypes using multivariate generalized estimating equations that adjusted for age, sex, and the first five principal components. Statistical significance was defined using Bonferroni correction. Results were replicated in an independent cohort, and replicated phenotypes were validated using medical record review of patients. FINDINGS: We studied n = 316,166 subjects from UKB without evidence of RA and screened for association between the RA GRS and n = 1317 phenotypes. In the UKB, 20 phenotypes were significantly associated with the RA GRS, of which 13 (65%) were immune-mediated conditions including polymyalgia rheumatica, granulomatosis with polyangiitis (GPA), type 1 diabetes, and multiple sclerosis. We further identified a novel association in celiac disease, where the HLA and non-HLA alleles had strong associations in opposite directions. Strikingly, we observed that the non-HLA GRS was exclusively associated with greater risk of the validated conditions, suggesting shared underlying pathways outside the HLA region.
INTERPRETATION: This study replicated and identified novel autoimmune phenotypes verified by medical record review that share immune pathways with RA and may inform opportunities for shared treatment targets, as well as risk assessment for conditions with a paucity of genomic data, such as GPA. FUNDING: This research was funded by the US National Institutes of Health (P30AR072577, R21AR078339, R35GM142879, T32AR007530) and the Harold and DuVal Bowen Fund.
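
The two quantitative ingredients named in the methods, a genetic risk score and a Bonferroni-corrected significance threshold, can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the abstract does not say how the GRS was weighted, so the weighted-sum form (risk-allele count times a per-variant weight such as a log odds ratio) is an assumption, as are the family-wise alpha of 0.05, the function names, and the toy inputs. Only the number of phenotypes tested (1317) comes from the abstract.

```python
from typing import Dict, List, Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Assumption: each weight is a per-variant effect size such as a log odds ratio.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_phenotypes(p_values: Dict[str, float], alpha: float, n_tests: int) -> List[str]:
    """Names of phenotypes whose p-value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [name for name, p in p_values.items() if p < threshold]


# Toy inputs (made up): three variants for one subject, hypothetical weights.
score = genetic_risk_score(dosages=[0, 1, 2], weights=[0.5, 0.2, 0.1])
print("GRS:", score)

# 1317 phenotypes were screened according to the abstract; alpha = 0.05 is assumed.
print("Per-test threshold:", bonferroni_threshold(0.05, 1317))

# Hypothetical phenotype labels and p-values, for illustration only.
toy_p = {"phenotype_a": 1e-6, "phenotype_b": 1e-3}
print(significant_phenotypes(toy_p, alpha=0.05, n_tests=1317))
```

With 1317 tests, the per-test threshold is roughly 3.8e-5, so a nominal p-value of 0.001 would not count as significant in a screen of this size.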


Subjects
Rheumatoid Arthritis , Genetic Predisposition to Disease , Humans , Genotype , Rheumatoid Arthritis/diagnosis , Rheumatoid Arthritis/genetics , Risk Factors , Phenotype , HLA Antigens/genetics , Histocompatibility Antigens Class II/genetics , HLA-DRB1 Chains/genetics , Alleles
19.
Psychiatry Res ; 323: 115175, 2023 05.
Article in English | MEDLINE | ID: mdl-37003169

ABSTRACT

Growing evidence has shown that applying machine learning models to large clinical data sources may exceed clinician performance in suicide risk stratification. However, many existing prediction models either suffer from "temporal bias" (a bias that stems from using case-control sampling) or require training on all available patient visit data. Here, we adopt a "landmark model" framework that aligns with clinical practice for prediction of suicide-related behaviors (SRBs) using a large electronic health record database. Using the landmark approach, we developed models for SRB prediction (regularized Cox regression and random survival forest) that establish a time point (e.g., a clinical visit) from which predictions are made over user-specified prediction windows using historical information up to that point. We applied this approach to cohorts from three clinical settings (general outpatient, psychiatric emergency department, and psychiatric inpatient) for varying prediction windows and lengths of historical data. Models achieved high discriminative performance (area under the receiver operating characteristic curve of 0.74-0.93 for the Cox model) across different prediction windows and settings, even with relatively short periods of historical data. In short, we developed accurate, dynamic SRB risk prediction models with the landmark approach that reduce bias and enhance the reliability and portability of suicide risk prediction models.
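
The landmark idea described above (fix a time point, use only history up to it, and predict over a window after it) can be sketched as a data-preparation step. This is a minimal illustration under assumptions, not the authors' pipeline: the `Patient` fields, the function name `landmark_row`, the exclusion rules, and the toy record are all hypothetical. The resulting rows could then be passed to a survival model such as a Cox regression.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    records: List[Tuple[float, str]]  # (day, code) pairs from the health record
    event_day: Optional[float]        # day of first event, None if never observed
    last_followup_day: float          # last day the patient was under observation


def landmark_row(patient: Patient, landmark_day: float,
                 history_days: float, window_days: float) -> Optional[dict]:
    """Build one training row anchored at the landmark.

    Features use only records in [landmark - history_days, landmark].
    The outcome is the time to event within (landmark, landmark + window_days],
    censored at the end of the window or of follow-up, whichever comes first.
    Returns None if the patient is not at risk at the landmark.
    """
    # Assumed eligibility rule: still followed and event-free at the landmark.
    if patient.last_followup_day <= landmark_day:
        return None
    if patient.event_day is not None and patient.event_day <= landmark_day:
        return None

    # Historical information up to the landmark only, so nothing leaks from the future.
    start = landmark_day - history_days
    history = [code for day, code in patient.records if start <= day <= landmark_day]

    horizon = landmark_day + window_days
    if patient.event_day is not None and patient.event_day <= horizon:
        time, event = patient.event_day - landmark_day, 1
    else:
        time, event = min(patient.last_followup_day, horizon) - landmark_day, 0
    return {"history": history, "time": time, "event": event}


# Toy patient (made up): three coded records and an event on day 130.
toy = Patient(records=[(10, "A"), (50, "B"), (120, "C")],
              event_day=130, last_followup_day=200)
print(landmark_row(toy, landmark_day=100, history_days=60, window_days=90))
print(landmark_row(toy, landmark_day=100, history_days=60, window_days=20))
```

Varying `history_days` and `window_days` corresponds to the different lengths of historical data and prediction windows mentioned in the abstract.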


Subjects
Hospital Emergency Service , Attempted Suicide , Humans , Attempted Suicide/psychology , Reproducibility of Results , ROC Curve